ON THE GAP BETWEEN RIP-PROPERTIES AND SPARSE
RECOVERY CONDITIONS


SJOERD DIRKSEN, GUILLAUME LECUÉ, AND HOLGER RAUHUT


Abstract. We consider the problem of recovering sparse vectors from
underdetermined linear measurements via $\ell_p$-constrained basis pursuit.
Previous analyses of this problem based on generalized restricted isometry
properties have suggested that two phenomena occur if $p \neq 2$. First, one
may need substantially more than $s\log(en/s)$ measurements (optimal for
$p = 2$) for uniform recovery of all $s$-sparse vectors. Second, the matrix
that achieves recovery with the optimal number of measurements may not be
Gaussian (as for $p = 2$). We present a new, direct analysis which shows that
in fact neither of these phenomena occur. Via a suitable version of the null
space property we show that a standard Gaussian matrix provides
$\ell_q/\ell_1$-recovery guarantees for $\ell_p$-constrained basis pursuit in
the optimal measurement regime. Our result extends to several heavier-tailed
measurement matrices. As an application, we show that one can obtain a
consistent reconstruction from uniform scalar quantized measurements in the
optimal measurement regime.


1. Introduction 

Compressive sensing m 13 [II] has established itself in recent years as a
rapidly growing research area with various promising applications in signal and
image processing and beyond, and it has triggered many developments on the
theoretical side. The theory predicts that (approximately) sparse signals can be
accurately recovered from incomplete and perturbed linear measurements. The
measurement process is described by a measurement matrix
$A \in \mathbb{C}^{m \times n}$ with $m < n$.

While the naive reconstruction approach via $\ell_0$-minimization is NP-hard m ,
several tractable recovery methods have been proposed, including basis pursuit
($\ell_1$-minimization), iterative hard thresholding and greedy methods. For all
these methods rigorous recovery guarantees have been shown, see [M] for details
and further references.

The restricted isometry property (RIP) is a well-established tool to analyze the
performance of sparse recovery methods. The standard version defines the restricted
isometry constant of order $s$ as the smallest number $\delta_s$ such that

(1) $(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2$ for all $x \in \Sigma_s$,

where $\Sigma_s$ is the set of all $s$-sparse vectors in $\mathbb{C}^n$ and
$\|\cdot\|_2$ denotes the usual $\ell_2$-norm. If $\delta_s$ is sufficiently small
we say that $A$ satisfies the RIP. If $\delta_s < \delta_0$ for some suitably
small $\delta_0$, then given measurements $y = Ax + e$ with
$\|e\|_2 \le \varepsilon$, the $\ell_2$-constrained $\ell_1$-minimization program
(also known as basis pursuit denoising)

$\min \|z\|_1$ subject to $\|Az - y\|_2 \le \varepsilon$

recovers a vector $x^\#$ which satisfies

( 2 ) \\x-x^\\2<s-^/^as[x)i + ^^, 

777 ,-*-/^ 
where $\sigma_s(x)_1 = \inf_{\|z\|_0 \le s} \|x - z\|_1$ is the error of best
$s$-term approximation to $x$ in $\ell_1$. A (scaled) Gaussian random matrix
satisfies the RIP with high probability provided that

(3) $m \ge C s \log(en/s)$,

where $C > 0$ is an absolute constant. This bound is optimal, see also below.
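
As a small illustration (not part of the original analysis), the best $s$-term approximation error $\sigma_s(x)_1$ can be computed directly: the infimum over $s$-sparse $z$ is attained by keeping the $s$ entries of $x$ that are largest in modulus. The function name `best_s_term_error_l1` is a hypothetical helper introduced here.

```python
def best_s_term_error_l1(x, s):
    """Return sigma_s(x)_1, the l1 distance from x to the set of s-sparse vectors."""
    # Keeping the s largest entries in modulus is optimal, so the error
    # is the l1 mass of all remaining entries.
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[s:])


if __name__ == "__main__":
    x = [3.0, -0.5, 0.0, 2.0, 0.25]
    # Mass outside the two largest entries (3.0 and 2.0).
    print(best_s_term_error_l1(x, 2))
```

In particular the error vanishes exactly when $x$ is itself $s$-sparse, which is the case in which the recovery bounds discussed here become exact.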

In certain cases it is of interest to measure the level of noise in $\ell_p$-norms
with $p$ different from 2 and to study the corresponding $\ell_p$-constrained basis
pursuit denoising program

(BPDNp) $\min \|z\|_1$ subject to $\|y - Az\|_p \le \varepsilon$.

The case $p = \infty$ appears, for instance, in quantized compressed sensing m, where
$\ell_\infty$-constrained basis pursuit can ensure consistent reconstruction, see
also Section 4 below. The program for $p = 1$ is more robust to outliers than
standard basis pursuit denoising. Also, when considering random measurement noise,
different values of $p$ are appropriate depending on the distribution of the noise
(see e.g. [TOj.flS]). For example, $p = 1$ is well-suited for double-exponential
noise, whereas $p = 2$ is appropriate for Gaussian noise.

Previous attempts at analyzing (BPDNp) have used RIP conditions of the form

(RIPp,q) $c\|x\|_q \le \|Ax\|_p \le C\|x\|_q$ for all $x \in \Sigma_s$.

It is part of the folklore in compressive sensing that (RIPp,q) implies stable and
robust recovery via (BPDNp), with an $\ell_q$-bound on the reconstruction error (see
[1 [H] for special cases). Unfortunately, all available results on the number of
required measurements for Gaussian and other random matrices ensuring (RIPp,q)
scale significantly worse than (3) when $p \neq 1, 2$. For certain values of $p$
and $q$, there are even negative results available which state that no matrix
whatsoever can satisfy (RIPp,q) in the optimal parameter regime (3). A more
detailed overview is given below.

The purpose of this paper is to illuminate the discrepancy between, on the one
hand, the requirements needed for a matrix to satisfy an RIP condition of the form
(RIPp,q) and, on the other hand, the conditions under which one can stably and
robustly recover any $s$-sparse (or approximately $s$-sparse) vector
$x \in \mathbb{C}^n$ from noisy linear measurements $y = Ax + e$ via the generalized
basis pursuit denoising program (BPDNp). Our results show that a study of the
statistical properties of (BPDNp) via the $\ell_q$-robust null space property
yields better results than via (RIPp,q), both in terms of the required number of
measurements and in terms of the allowed distribution of the random measurements.
In particular, one can achieve stable and robust reconstruction with Gaussian
random matrices in the optimal parameter regime (3) for any
$1 \le p \le \infty$. This result extends to various random matrices with
heavier-tailed entries such as exponential matrices, see Section 5 for more
information. Our proof relies on the small ball method developed in
[191 US EH EU].

Notation. The usual $\ell_p$-norm on $\mathbb{C}^n$ is denoted by
$\|x\|_p = (\sum_{j=1}^n |x_j|^p)^{1/p}$ for $1 \le p < \infty$ and
$\|x\|_\infty = \max_{j=1,\ldots,n} |x_j|$. We let $B_{\ell_p^n}$ and
$S_{\ell_p^n}$ denote the associated unit ball and unit sphere, respectively. The
expression $\|x\|_0 := \#\{j : x_j \neq 0\}$ counts the number of nonzero
coefficients of $x$. The expectation of a random variable $Z$ is written
$\mathbb{E}Z$ and the probability of an event $E$ is denoted by $\mathbb{P}(E)$.
The $L_p$-norm of a measurable function $f$ with respect to a measure $\mu$ is
denoted by $\|f\|_{L_p(\mu)}$. A Rademacher random variable $\varepsilon$ satisfies
$\mathbb{P}(\varepsilon = 1) = \mathbb{P}(\varepsilon = -1) = 1/2$ and a Rademacher
sequence is a sequence of independent Rademacher random variables. For
$t \in \mathbb{R}$, $\lfloor t \rfloor$ is the largest integer not exceeding $t$
and $\lceil t \rceil$ is the smallest integer not less than $t$. Finally, we write
$A \lesssim B$ if $A \le cB$ for a universal constant $c > 0$.

2. The relation between (RIPp,q) and (BPDNp)

Let us first summarize the known results on (RIPp,q) and its implication for
sparse recovery via (BPDNp). As is well known and already described above, the
(RIP2,2) property was introduced in compressed sensing by Candès and Tao in [5J[n].
They showed that an $m \times n$ matrix $A$ scaled by $m^{-1/2}$ with i.i.d.
standard Gaussian entries satisfies (1) with probability $1 - \eta$ if
$m \gtrsim \delta_s^{-2}(s\log(en/s) + \log(\eta^{-1}))$. If $A$ has this property
with $\delta_s$ smaller than a fixed threshold and $\|y - Ax\|_2 \le \varepsilon$,
then any minimizer $x^\#$ for (BPDN2) satisfies an $\ell_q/\ell_1$-guarantee of
the form

$\|x - x^\#\|_q \lesssim s^{1/q-1}\sigma_s(x)_1 + s^{1/q-1/2} m^{-1/2}\varepsilon$

for any $1 \le q \le 2$; the special case $q = 2$ is stated in (2). In particular,
if $x$ is exactly $s$-sparse (so $\sigma_s(x)_1 = 0$) and $\varepsilon = 0$, then
$x$ can be reconstructed exactly. Conversely, it is known that
$m \gtrsim s\log(n/s)$ measurements are also necessary for exact reconstruction of
all $s$-sparse vectors (see e.g. [HI Theorem 10.11]).

A very similar connection exists between (RIP1,1) and (BPDN1) [4]. Indeed, the
adjacency matrix of a random left $d$-regular bipartite graph with $n$ left
vertices and $m$ right vertices with probability $1 - \eta$ satisfies an (RIP1,1)
condition of the form

(1 — (5)^/^||a:||i < < ||a:||i for all x G Es, 

provided that d = log(en/(sry))] and m > css\og(en/{srf)). As a consequence 
in Theorem 12], if $\|y - Ax\|_1 \le \varepsilon$ then any minimizer $x^\#$ of
(BPDN1) satisfies the $\ell_1/\ell_1$ guarantee

< C'(d)(|CTs(x)i -H 

where $C(\delta) = O((1 - 2\delta)^{-1})$ for $\delta \uparrow 1/2$.

Interestingly, the rescaled adjacency matrix d~^A does not satisfy the (RIP 2 , 2 ) 
condition. In fact, any (RIP 2 , 2 )-matrix with binary entries must satisfy m > 
s^log(en/s) [TTl Theorem 4.6.1]. Conversely, if A is standard Gaussian, mr^l'^A 
cannot satisfy an (RIP 14 ) condition for m ^ slog(en/s) [!]• To see this, one can 
consider $x = e_1$, $\tilde{x} = \frac{1}{s}\sum_{i=1}^s e_i$, where the $e_i$
denote the standard basis vectors. Then $\|x\|_1 = \|\tilde{x}\|_1 = 1$, but
$\|Ax\|_1 \approx \sqrt{s}\,\|A\tilde{x}\|_1$.
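
The effect can be observed numerically. The following is a minimal Monte Carlo sketch, assuming a standard Gaussian matrix and taking the second test vector to be the average of the first $s$ standard basis vectors; the function name `l1_norm_ratio` and the chosen sizes are illustrative assumptions. Since both vectors are supported on the first $s$ coordinates, only $s$ columns are sampled.

```python
import random


def l1_norm_ratio(m, s, seed=0):
    """Return ||A e_1||_1 / ||A x_tilde||_1 for a standard Gaussian A with m rows.

    x_tilde is the average of the first s standard basis vectors, so both
    test vectors have l1 norm equal to one.
    """
    rng = random.Random(seed)
    num = 0.0  # accumulates ||A e_1||_1
    den = 0.0  # accumulates ||A x_tilde||_1
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(s)]
        num += abs(row[0])
        den += abs(sum(row) / s)
    return num / den


if __name__ == "__main__":
    # The ratio concentrates around sqrt(s), here sqrt(16) = 4.
    print(l1_norm_ratio(m=4000, s=16))
```

Two vectors of equal $\ell_1$-norm are thus mapped to images whose $\ell_1$-norms differ by a factor of order $\sqrt{s}$, which rules out a two-sided $\ell_1$ to $\ell_1$ isometry with constants independent of $s$.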

The two positive results for p = q = 2 and p = q = 1 have triggered further 
research on (BPDNp) via restricted isometry properties. In [16] it was shown that 
a standard m x n Gaussian matrix with 

( 4 ) TO > log(en/(s(5))-I-log(77“^)J -|-(p—1)2^“^ 

satisfies an (RIPp,2) property for $2 \le p < \infty$ of the form

$(1 - \delta)^{1/2}\|x\|_2 \le \mu_p^{-1}\|Ax\|_p \le (1 + \delta)^{1/2}\|x\|_2$ for all $x \in \Sigma_s$,

where $\mu_p = \mathbb{E}\|G\|_p$ and $G$ is a standard $m$-dimensional Gaussian
random vector. If $A$ satisfies this property for sparsity levels $s, 2s, 3s$ with
constants $\delta_s, \delta_{2s}, \delta_{3s}$ small enough (see 0 Theorem 1] for a
precise statement), then for all $x \in \mathbb{C}^n$ with
$\|y - Ax\|_p \le \varepsilon$, any minimizer $x^\#$ of (BPDNp) satisfies an
$\ell_2/\ell_1$-guarantee

$\|x - x^\#\|_2 \lesssim s^{-1/2}\sigma_s(x)_1 + \frac{\varepsilon}{\mu_p}$.

In [3] it is shown that the $m \times n$ adjacency matrix $A$ of a random left
$d$-regular bipartite graph with $n$ left vertices and $m$ right vertices with
high probability satisfies an (RIPp,p) property

$(1 - \delta)\|x\|_p^p \le d^{-1}\|Ax\|_p^p \le (1 + \delta)\|x\|_p^p$ for all $x \in \Sigma_s$,

provided that, in the case $1 < p < 2$,

m > Cp{sPS-^ + s4-2/p-P^-2/(p-1)) 

d > Cp{S-hP-^ + 

where $C_p$, $C'_p$ are singular for $p \downarrow 1$ and $p \uparrow 2$, or in the case $2 < p < \infty$,

m > p^P6~^sP log^“^(n), 
d>p^P5-^sP-^\ogP-^{n). 

If $A$ satisfies an (RIPp,p) property for $p > 1$, then one can recover all
$x \in \mathbb{C}^n$ with $\|y - Ax\|_p \le \varepsilon$ via (BPDNp) with an
$\ell_p/\ell_1$-guarantee of the form

$\|x - x^\#\|_p \lesssim s^{1/p-1}\sigma_s(x)_1 + \varepsilon$,

see [3, Theorem A.6] for a more precise statement. Interestingly, [3] also proved a
lower bound on $m$ assuming that the $m \times n$ matrix satisfies (RIPp,p). Their
result [3, Theorem 4.1] essentially shows that one needs at least $m \gtrsim s^p$
measurements for $p \neq 2$, so that the case $p = 2$ should be considered a
singularity. A straightforward modification of their argument shows that to
satisfy (RIPp,2) one needs at least $m \gtrsim s^{p/2}$, so that also the result in
[16] (cf. (4)) cannot be improved significantly. We leave the verification of this
implication to the interested reader.

To summarize, two important phenomena occur when moving away from the
familiar (RIP2,2). First, one may need to consider different random matrix
constructions to satisfy an RIP property with the optimal number of measurements.
Second, the optimal scaling of the number of measurements in terms of the signal
sparsity may dramatically worsen, especially for $p > 2$.

3. Sparse recovery via BPDNp: improved results 

One might think that the two phenomena concerning the (RIPp,q) properties for
$p \neq 2$ mentioned above may carry over to recovery results via (BPDNp) (see e.g.
[31 US]), in particular, that the minimal required number of measurements scales
significantly worse than linearly in the sparsity. We will now show that rather the
contrary is true: the scaling in terms of the sparsity generally does not worsen
if $p \neq 2$ and, moreover, the optimal recovery results are realized by a
standard Gaussian matrix.

Let us note that earlier work already identified a looseness in the relation between
the classical (RIP2,2) and (BPDN2). For example, if $A$ has independent, isotropic,
log-concave rows, then (1) is satisfied with high probability if
$m \ge c(\delta)s\log^2(en/s)$ (cf. P), and the square in the log-factor cannot be
removed (cf. [2 Proposition 5.5]). On the other hand, this matrix still satisfies,
with high probability, the exact reconstruction property for $s$-sparse vectors via
$\ell_1$-minimization in the optimal measurement regime $m \sim s\log(en/s)$ (cf.
[131 Theorem 7.3]; see also m for the special case of measurement matrices with
i.i.d. Weibull entries). More recently, near-matching necessary and sufficient
conditions on the moments of the i.i.d. entries of a matrix to satisfy the exact
reconstruction property (and more generally, stable and robust recovery via
(BPDN2)) in this regime were derived by the second-named author and Mendelson m.
We recover as a special case a variation of this (sufficient) result, see
Corollary 5.3 below. Such a result cannot be proved via an RIP-based analysis since
the right-hand side of (RIP2,2), i.e.,

$\|Ax\|_2 \le C\|x\|_2$ for all $x \in \Sigma_s$,

requires either strong concentration properties or a larger number of measurements
$m$ than the optimal number $s\log(en/s)$ (see the discussion in [3T] and Section 5
for more details).

For our analysis we let $X_1, \ldots, X_m$ be i.i.d. copies of a random vector $X$
in $\mathbb{C}^n$, which is defined on a probability space $(\Omega, \mathbb{P})$.
Let $P_m$ be the associated empirical measure

$P_m = \frac{1}{m}\sum_{i=1}^m \delta_{X_i}$.


The following observation follows immediately from the proof of Theorem 2.1 in
m, by replacing the “Chebyshev” bound

$\|f\|_{L_2(P_m)}^2 \ge u^2 P_m(|f| \ge u)$

by

$\|f\|_{L_p(P_m)}^p \ge u^p P_m(|f| \ge u)$.
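
The $\ell_p$ version of this Chebyshev-type bound is elementary: every sample with $|f(X_i)| \ge u$ contributes at least $u^p$ to the empirical $p$-th moment. A minimal sketch that evaluates both sides for a list of sample values follows; the helper name `empirical_lp_lower_bound` is hypothetical.

```python
def empirical_lp_lower_bound(values, p, u):
    """Return (lhs, rhs) of (1/m) sum |a_i|^p >= u^p * #{i : |a_i| >= u} / m."""
    m = len(values)
    # Empirical p-th moment, i.e. the p-th power of the L_p(P_m) norm.
    lhs = sum(abs(a) ** p for a in values) / m
    # Empirical probability of the event {|f| >= u}.
    fraction_large = sum(1 for a in values if abs(a) >= u) / m
    return lhs, (u ** p) * fraction_large


if __name__ == "__main__":
    lhs, rhs = empirical_lp_lower_bound([0.1, -2.0, 0.7, 1.5], p=3, u=1.0)
    print(lhs >= rhs, rhs)
```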


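Lemma 3.1. Fix $1 \le p < \infty$. Let $\mathcal{F}$ be a class of functions from
$\mathbb{C}^n$ into $\mathbb{C}$. Consider

$Q_{\mathcal{F}}(u) = \inf_{f \in \mathcal{F}} \mathbb{P}(|f(X)| \ge u)$

and

$R_m(\mathcal{F}) = \mathbb{E}\sup_{f \in \mathcal{F}} \Big|\frac{1}{m}\sum_{i=1}^m \varepsilon_i f(X_i)\Big|$,

where $(\varepsilon_i)_{i \ge 1}$ is a Rademacher sequence. Let $u > 0$ and
$t > 0$, then, with probability at least $1 - 2e^{-2t^2}$,

$\inf_{f \in \mathcal{F}} \frac{1}{m}\sum_{i=1}^m |f(X_i)|^p \ge u^p\Big(Q_{\mathcal{F}}(2u) - \frac{4}{u}R_m(\mathcal{F}) - \frac{t}{\sqrt{m}}\Big)$.
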

Consider the following sparse recovery problem: we take $m$ noisy linear
measurements of an (approximately) $s$-sparse signal $x$, i.e., we observe
$y = Ax + e$ where $A \in \mathbb{C}^{m \times n}$ and we suppose that the noise
satisfies $\|e\|_p \le \varepsilon$. We aim to recover $x$ from $y$ via (BPDNp).
For the analysis we recall the following standard notion (cf. for instance
[HI Definition 4.21]). Given $q \ge 1$, we say that $A$ satisfies the
$\ell_q$-robust null space property of order $s$ with constants $0 < \rho < 1$ and
$\tau > 0$ with respect to a norm $\|\cdot\|$ if for any set $S \subset [n]$ with
$|S| \le s$ and any $x \in \mathbb{C}^n$,

$\|x_S\|_q \le \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1 + \tau\|Ax\|$.

If $A$ has this property, then any solution $x^\#$ to

$\min \|z\|_1$ subject to $\|y - Az\| \le \varepsilon$

satisfies, for any $1 \le r \le q$, the reconstruction error bound

$\|x - x^\#\|_r \lesssim C_\rho s^{1/r-1}\sigma_s(x)_1 + D_\rho \tau s^{1/r-1/q}\varepsilon$

with $C_\rho = (1 + \rho)^2/(1 - \rho)$ and $D_\rho = (3 + \rho)/(1 - \rho)$ when
$\|e\| \le \varepsilon$ (cf. [H Theorem 4.25]).

To analyze $\ell_q$-robust null space properties, we introduce the cone

$T_{\rho,s}^q = \{x \in \mathbb{C}^n : \exists S \subset [n], |S| = s : \|x_S\|_q \ge \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1\}$.

Note that $T_{\rho,s}^q$ contains $\Sigma_s$. We use the following observation.
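
Membership in this cone is easy to test, because the index set $S$ consisting of the $s$ largest entries in modulus simultaneously maximizes $\|x_S\|_q$ and minimizes $\|x_{S^c}\|_1$. The following sketch assumes the cone is defined by the inequality $\|x_S\|_q \ge \rho s^{1/q-1}\|x_{S^c}\|_1$ for some $S$ with $|S| = s$; the function name `in_cone` is a hypothetical helper.

```python
def in_cone(x, s, q, rho):
    """True if some S with |S| = s has ||x_S||_q >= rho * s**(1/q - 1) * ||x_{S^c}||_1."""
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    head, tail = magnitudes[:s], magnitudes[s:]
    # It suffices to test the s largest entries: this choice makes the
    # left-hand side largest and the right-hand side smallest.
    head_norm = sum(v ** q for v in head) ** (1.0 / q)
    return head_norm >= rho * s ** (1.0 / q - 1.0) * sum(tail)


if __name__ == "__main__":
    print(in_cone([1.0, 0.0, 0.0, 0.0], s=1, q=2, rho=0.5))  # sparse vector
    print(in_cone([1.0] * 100, s=1, q=2, rho=0.5))           # flat vector
```

Every $s$-sparse vector passes the test, while a vector whose mass is spread evenly over many coordinates does not.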
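Lemma 3.2. Fix $1 \le q < \infty$. Set

$\Sigma_s^q = \{x \in \mathbb{C}^n : \|x\|_0 \le s, \|x\|_q = 1\}$

and let $D_s^q$ be its convex hull. Then $D_s^q$ is the unit ball with respect to
the norm

$\|x\|_{D_s^q} = \sum_{\ell=1}^{\lceil n/s \rceil}\Big(\sum_{i \in I_\ell}(x_i^*)^q\Big)^{1/q}$,

where $I_1, \ldots, I_{\lceil n/s \rceil}$ form a uniform partition of $[n]$, i.e.,

$I_\ell = \{s(\ell - 1) + 1, \ldots, s\ell\}$, $\ell = 1, \ldots, \lceil n/s \rceil - 1$,

$I_\ell = \{s(\lceil n/s \rceil - 1) + 1, \ldots, n\}$, $\ell = \lceil n/s \rceil$,

and $x^*$ is the nonincreasing rearrangement of $x$. As a consequence,

$T_{\rho,s}^q \cap B_{\ell_q^n} \subset (2 + \rho^{-1})D_s^q$.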

Proof. We proceed by making straightforward modifications to the proof of [m
Lemma 3] (see also [531 Lemma 4.5] or [55]), which corresponds to the case $q = 2$.

A vector $x \in D_s^q$ can be represented as $x = \sum_i \alpha_i x_i$ with
$\alpha_i \ge 0$, $\sum_i \alpha_i = 1$ and $x_i \in S_{\ell_q^n}$,
$\|x_i\|_0 \le s$. In particular, $\|x_i\|_{D_s^q} = \|x_i\|_q = 1$. By the
triangle inequality

$\|x\|_{D_s^q} \le \sum_i \alpha_i \|x_i\|_{D_s^q} = \sum_i \alpha_i = 1$,

so $D_s^q$ is contained in the $\|\cdot\|_{D_s^q}$-unit ball. To prove the reverse
inclusion, suppose that $\|x\|_{D_s^q} \le 1$. We partition the index set $[n]$
into subsets $S_1, S_2, \ldots$ of size $s$, such that $S_1$ corresponds to the
indices of the $s$ largest entries of $x$ in modulus, $S_2$ to the next $s$ ones,
etc. Set $\alpha_i = \|x_{S_i}\|_q$. Then $x$ can be written as

$x = \sum_{i : \alpha_i \neq 0} \alpha_i(\alpha_i^{-1}x_{S_i})$,

where

$\sum_i \alpha_i = \sum_i \|x_{S_i}\|_q = \|x\|_{D_s^q} \le 1$.

Clearly, for any $\alpha_i \neq 0$, $\|\alpha_i^{-1}x_{S_i}\|_q = 1$ and
$\|\alpha_i^{-1}x_{S_i}\|_0 \le s$, so $x \in D_s^q$.

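To prove the second statement, fix $x \in T_{\rho,s}^q \cap B_{\ell_q^n}$ and write

(5) $\|x\|_{D_s^q} = \Big(\sum_{i \in I_1}(x_i^*)^q\Big)^{1/q} + \Big(\sum_{i \in I_2}(x_i^*)^q\Big)^{1/q} + \sum_{\ell \ge 3}\Big(\sum_{i \in I_\ell}(x_i^*)^q\Big)^{1/q}$.

To bound the last term, note that for each $i \in I_\ell$, $\ell \ge 3$,

$x_i^* \le \frac{1}{s}\sum_{j \in I_{\ell-1}} x_j^*$, and hence $\Big(\sum_{i \in I_\ell}(x_i^*)^q\Big)^{1/q} \le \frac{1}{s^{1-1/q}}\sum_{j \in I_{\ell-1}} x_j^*$.

Summing up over $\ell \ge 3$ yields

$\sum_{\ell \ge 3}\Big(\sum_{i \in I_\ell}(x_i^*)^q\Big)^{1/q} \le \frac{1}{s^{1-1/q}}\sum_{\ell \ge 2}\sum_{j \in I_\ell} x_j^*$.

Since $x \in T_{\rho,s}^q \cap B_{\ell_q^n}$, there is an $S \subset [n]$ with
$|S| = s$, such that $\|x_S\|_q \ge \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1$.
Therefore,

$\sum_{\ell \ge 2}\sum_{i \in I_\ell} x_i^* \le \|x_{S^c}\|_1 \le \frac{s^{1-1/q}}{\rho}\|x_S\|_q \le \frac{s^{1-1/q}}{\rho}\Big(\sum_{i \in I_1}(x_i^*)^q\Big)^{1/q}$,

where we used that in the worst case $S$ corresponds to the $s$ largest absolute
coefficients of $x$. It follows that

$\sum_{\ell \ge 3}\Big(\sum_{i \in I_\ell}(x_i^*)^q\Big)^{1/q} \le \rho^{-1}\Big(\sum_{i \in I_1}(x_i^*)^q\Big)^{1/q}$.

Since $\|x\|_q \le 1$, (5) implies that $\|x\|_{D_s^q} \le 2 + \rho^{-1}$. □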
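
The inclusion can be sanity-checked numerically. The sketch below assumes the block norm is the sum of the $\ell_q$-norms of consecutive size-$s$ blocks of the nonincreasing rearrangement, and that the cone is defined by $\|x_S\|_q \ge \rho s^{1/q-1}\|x_{S^c}\|_1$; all function names and the random test vectors are illustrative choices, not taken from the paper.

```python
import random


def lq_norm(values, q):
    """Plain l_q norm of a list of real or complex numbers."""
    return sum(abs(v) ** q for v in values) ** (1.0 / q)


def in_cone(x, s, q, rho):
    """Cone test using the s largest entries, which is the optimal choice of S."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return lq_norm(mags[:s], q) >= rho * s ** (1.0 / q - 1.0) * sum(mags[s:])


def d_norm(x, s, q):
    """Sum of l_q norms over consecutive size-s blocks of the sorted magnitudes."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(lq_norm(mags[i:i + s], q) for i in range(0, len(mags), s))


def largest_d_norm_in_cone(n=40, s=5, q=2.0, rho=0.5, trials=200, seed=1):
    """Largest block norm observed over random cone vectors with ||x||_q = 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # s large entries plus a small dense tail, so that x typically lies in the cone.
        x = [rng.gauss(0.0, 1.0) + 2.0 if i < s else 0.1 * rng.gauss(0.0, 1.0)
             for i in range(n)]
        scale = lq_norm(x, q)
        x = [v / scale for v in x]
        if in_cone(x, s, q, rho):
            worst = max(worst, d_norm(x, s, q))
    return worst


if __name__ == "__main__":
    # Should not exceed 2 + 1/rho = 4 for rho = 0.5.
    print(largest_d_norm_in_cone())
```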

We are now prepared to prove the main result of this article. To keep our
exposition accessible, we first consider the special case of a standard Gaussian
random matrix, i.e., a matrix with independent normally distributed entries with
mean zero and variance one. In Section 5 we generalize our result to a wider class
of random matrices.


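Theorem 3.3. Let $A$ be an $m \times n$ standard Gaussian matrix. Fix
$1 \le p \le \infty$, $q \ge 2$ and $0 < \eta < 1$. Suppose that

(6) $m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$.

Then, with probability exceeding $1 - \eta$ the following holds: for any
$x \in \mathbb{C}^n$ and $y = Ax + e$, where $\|e\|_p \le \varepsilon$, any
solution $x^\#$ to (BPDNp) satisfies

$\|x - x^\#\|_r \lesssim s^{1/r-1}\sigma_s(x)_1 + s^{1/r-1/q}\frac{\varepsilon}{m^{1/p}}$

for any $1 \le r \le q$.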
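
To get a feeling for the measurement condition, the following sketch evaluates an expression of the form $s^{2-2/q}\log(en/s) + \log(\eta^{-1})$. The implicit constant in the theorem is not specified, so the multiplier `C` below (defaulting to 1) is a purely hypothetical placeholder, as is the function name `measurement_scaling`; only the dependence on $s$, $n$, $q$ and $\eta$ is meaningful.

```python
import math


def measurement_scaling(n, s, q, eta, C=1.0):
    """Evaluate C * (s**(2 - 2/q) * log(e*n/s) + log(1/eta)).

    C is a hypothetical stand-in for the unspecified absolute constant.
    """
    sparsity_term = s ** (2.0 - 2.0 / q) * math.log(math.e * n / s)
    confidence_term = math.log(1.0 / eta)
    return C * (sparsity_term + confidence_term)


if __name__ == "__main__":
    # For q = 2 the sparsity term is linear in s; for larger q it grows faster.
    for q in (2, 4, 8):
        print(q, measurement_scaling(n=10_000, s=50, q=q, eta=0.01))
```

For $q = 2$ the expression reduces to the familiar $s\log(en/s) + \log(\eta^{-1})$, while for $q \to \infty$ the sparsity term approaches $s^2\log(en/s)$.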


Remark 3.4. The most interesting case in the above theorem is $q = 2$. Then the
optimal scaling $m \ge Cs\log(en/s)$ implies that with high probability we obtain
the error bound

$\|x - x^\#\|_2 \lesssim s^{-1/2}\sigma_s(x)_1 + \frac{\varepsilon}{m^{1/p}}$

for reconstruction via $\ell_p$-constrained basis pursuit.

For $q > 2$ the scaling (6) of $m$ in terms of the sparsity is near-optimal. Indeed,
it is known [551 P- 213] that for g > 2 and m < n — 1, the Gelfand width of in 
£g satisfies 

Thus, if we want to satisfy \\x — a;^||g < s^/'^~^(Ts{x)i for all x G C", then it is 
necessary (cf. [TH Theorem 10.4]) that or to > Thus, up 

to possibly a logarithmic factor we cannot improve the scaling of $m$ in terms of the
sparsity in Theorem 3.3.


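Proof of Theorem 3.3. Suppose first that $p < \infty$. As was noted before, it
suffices to show that with probability at least $1 - \eta$ the $\ell_q$-robust
null space property of order $s$ holds with respect to the $\ell_p$-norm, with
parameters $\rho$ and $\tau/m^{1/p}$ for some $0 < \rho < 1$ and $\tau > 0$. Let
us first observe that it suffices to show that

(7) $\mathbb{P}\Big(\inf_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \|Ax\|_p \ge \frac{m^{1/p}}{\tau}\Big) \ge 1 - \eta$.

Indeed, if this is true, then with probability at least $1 - \eta$ the following
holds: if $x \in \mathbb{C}^n$ satisfies $\|Ax\|_p < (m^{1/p}/\tau)\|x\|_q$ then
$x/\|x\|_q$ is not in $T_{\rho,s}^q$. Therefore, for any $S \subset [n]$ with
$|S| \le s$,

$\|x_S\|_q \le \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1 \le \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1 + \frac{\tau}{m^{1/p}}\|Ax\|_p$.

On the other hand, if $x \in \mathbb{C}^n$ satisfies
$\|Ax\|_p \ge (m^{1/p}/\tau)\|x\|_q$, then trivially

$\|x_S\|_q \le \|x\|_q \le \frac{\tau}{m^{1/p}}\|Ax\|_p \le \frac{\rho}{s^{1-1/q}}\|x_{S^c}\|_1 + \frac{\tau}{m^{1/p}}\|Ax\|_p$.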
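To prove (7), we write

$\inf_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \frac{\|Ax\|_p}{m^{1/p}} = \inf_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \Big(\frac{1}{m}\sum_{i=1}^m |\langle X_i, x\rangle|^p\Big)^{1/p}$,

where $X_i$ denotes the $i$-th row of $A$. To apply Lemma 3.1 we estimate the small
ball probability $Q_{\mathcal{F}}$ and the expected Rademacher supremum
$R_m(\mathcal{F})$ for the set of linear functions

$\mathcal{F} = \{\langle \cdot, x\rangle : x \in T_{\rho,s}^q \cap S_{\ell_q^n}\}$.

Let $V = m^{-1/2}\sum_{i=1}^m \varepsilon_i X_i$, then by Lemma 3.2

$R_m(\mathcal{F}) = m^{-1/2}\,\mathbb{E}\sup_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \langle V, x\rangle$

$\le (2 + \rho^{-1})m^{-1/2}\,\mathbb{E}\sup_{x \in D_s^q} \langle V, x\rangle$

$= (2 + \rho^{-1})m^{-1/2}\,\mathbb{E}\sup_{x \in \Sigma_s^q} \langle V, x\rangle$,

as $D_s^q$ is the convex hull of $\Sigma_s^q$. Since any $x \in \Sigma_s^q$
satisfies $\|x\|_2 \le s^{1/2-1/q}\|x\|_q$,

$R_m(\mathcal{F}) \le s^{1/2-1/q}(2 + \rho^{-1})m^{-1/2}\,\mathbb{E}\sup_{x \in \Sigma_s^2} \langle V, x\rangle$.

Since $X_1, \ldots, X_m$ are independent standard Gaussian vectors, so is $V$. Thus,

$\mathbb{E}\sup_{x \in \Sigma_s^2} \langle V, x\rangle = w(\Sigma_s^2)$,

the Gaussian width of $\Sigma_s^2$. It is known that

$w(\Sigma_s^2) \le \sqrt{2s\log(en/s)} + \sqrt{s}$,

see e.g. [HI Lemma 4], and we can conclude that

$R_m(\mathcal{F}) \le c\,s^{1-1/q}(2 + \rho^{-1})m^{-1/2}\sqrt{\log(en/s)}$.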
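
The Gaussian width bound used in this step can be checked by simulation. For a fixed vector $g$, the supremum of $\langle g, x\rangle$ over $s$-sparse unit vectors equals the $\ell_2$-norm of the $s$ largest entries of $g$ in modulus, so a Monte Carlo estimate only needs sorting. This is a rough sketch with illustrative parameter choices; the function names are hypothetical.

```python
import math
import random


def gaussian_width_estimate(n, s, trials=200, seed=0):
    """Monte Carlo estimate of E sup <g, x> over s-sparse x with ||x||_2 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        mags = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(n)), reverse=True)
        # The supremum is the l2 norm of the s largest entries in modulus.
        total += math.sqrt(sum(v * v for v in mags[:s]))
    return total / trials


def width_upper_bound(n, s):
    """The bound sqrt(2 s log(e n / s)) + sqrt(s) quoted in the text."""
    return math.sqrt(2.0 * s * math.log(math.e * n / s)) + math.sqrt(s)


if __name__ == "__main__":
    n, s = 200, 5
    print(gaussian_width_estimate(n, s), width_upper_bound(n, s))
```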

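To estimate the small ball probability, note that, since $\|x\|_q \le \|x\|_2$,
for any $x \in S_{\ell_q^n}$,

$\mathbb{P}(|\langle X_i, x\rangle| \ge u) = \mathbb{P}\Big(\Big|\Big\langle X_i, \frac{x}{\|x\|_2}\Big\rangle\Big| \ge \frac{u}{\|x\|_2}\Big) \ge \mathbb{P}\Big(\Big|\Big\langle X_i, \frac{x}{\|x\|_2}\Big\rangle\Big| \ge u\Big) = \mathbb{P}(|g| \ge u)$,

where $g$ is a standard Gaussian real-valued random variable. Therefore,

$Q_{\mathcal{F}}(2u) \ge \mathbb{P}(|g| \ge 2u)$.

Now pick $u_*$ small enough so that the right hand side is bigger than 1/2, say.
Pick $m$ large enough so that

$\max\Big\{\frac{4c(2 + \rho^{-1})s^{1-1/q}\sqrt{\log(en/s)}}{u_*\sqrt{m}}, \frac{\sqrt{\log(2/\eta)}}{\sqrt{2m}}\Big\} \le \frac{1}{8}$.

By Lemma 3.1 we can now conclude that (7) holds with $\tau = 4^{1/p}/u_*$.

Finally, let $p = \infty$. Since $\|Ax\|_{\log m} \le e\|Ax\|_\infty$,

$\mathbb{P}\Big(\inf_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \|Ax\|_\infty \ge \frac{1}{\tau}\Big) \ge \mathbb{P}\Big(\inf_{x \in T_{\rho,s}^q \cap S_{\ell_q^n}} \|Ax\|_{\log m} \ge \frac{e}{\tau}\Big)$.

Thus, in this case the result follows from our proof for $p = \log m$.

□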
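
Two numerical facts used in the proof are easy to verify: a level $u_*$ with $\mathbb{P}(|g| \ge 2u_*) \ge 1/2$ exists, and $m^{1/\log m} = e$, which is why the case $p = \infty$ reduces to $p = \log m$. The sketch below uses the identity $\mathbb{P}(|g| \ge t) = \operatorname{erfc}(t/\sqrt{2})$ for a standard Gaussian $g$; the function names and the bisection routine are illustrative assumptions.

```python
import math


def gaussian_two_sided_tail(t):
    """P(|g| >= t) for a standard real Gaussian g."""
    return math.erfc(t / math.sqrt(2.0))


def largest_small_ball_level(target=0.5, tol=1e-12):
    """Bisection for the largest u with P(|g| >= 2u) >= target."""
    lo, hi = 0.0, 5.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gaussian_two_sided_tail(2.0 * mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def log_m_norm_factor(m):
    """The factor m**(1/log m) comparing the l_{log m} and l_infinity norms on R^m."""
    return m ** (1.0 / math.log(m))


if __name__ == "__main__":
    print(largest_small_ball_level())  # any smaller u_* also works
    print(log_m_norm_factor(1000))     # equals e up to rounding
```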

4. Application to quantized compressed sensing

Consider the situation where we quantize noiseless compressed sensing measurements
using a uniform scalar quantization scheme. That is, we observe
$y = Q_\theta(Ax)$, where $Q_\theta : \mathbb{R}^m \to (\theta\mathbb{Z} + \theta/2)^m$
is the uniform quantizer with bin width $\theta$ defined by
$Q_\theta(z) = (\theta\lfloor z_i/\theta \rfloor + \theta/2)_{i=1}^m$. Graphically,
we divide $\mathbb{R}^m$ into hypercubes (or ‘bins’) with side length $\theta$ and
map $Ax$ to the center of the hypercube in which it resides. We view the quantized
measurements as noisy linear measurements $y = Ax + e$, by setting
$e = Q_\theta(Ax) - Ax$. Since the bin width of the quantization is $\theta$, we
clearly have $\|e\|_\infty \le \theta/2$.
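
The quantizer is straightforward to write down. The following minimal sketch assumes the bin width is denoted $\theta$ and that each coordinate is mapped to the center of its bin as described above; the function names are hypothetical.

```python
import math


def quantize(z, theta):
    """Uniform scalar quantizer: map each entry to the center of its bin of width theta."""
    return [theta * math.floor(zi / theta) + theta / 2.0 for zi in z]


def quantization_error_sup_norm(z, theta):
    """Sup-norm of e = Q(z) - z, which never exceeds theta / 2."""
    return max(abs(qi - zi) for qi, zi in zip(quantize(z, theta), z))


if __name__ == "__main__":
    z = [0.3, -0.3, 1.0, 2.49]
    print(quantize(z, 0.5))
    print(quantization_error_sup_norm(z, 0.5))
```

A value lying exactly on the lower edge of a bin is mapped to the center above it, so the error bound $\theta/2$ can be attained.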

To obtain a satisfactory reconstruction $x^\#$ of the signal, we would like to
ensure that it is quantization consistent. This means that we require that
$y = Q_\theta(Ax^\#)$. If we define

$B_\theta = \{z \in \mathbb{R}^m : -\theta/2 \le z_i < \theta/2, \ i = 1, \ldots, m\}$,

then $x^\#$ is quantization consistent if and only if $Ax^\# - y \in B_\theta$.
Thus, we should solve the following quantization consistent basis pursuit program

(QCBP) $\min \|z\|_1$ subject to $Az - y \in B_\theta$.

This program is strongly related to (BPDN∞) with $\varepsilon = \theta/2$ (which
corresponds to taking the closure $\overline{B_\theta}$ instead of $B_\theta$ in
(QCBP)). In fact, either 1) a minimizer for (QCBP) exists, this is then also a
minimizer for (BPDN∞), or 2) no minimizer exists, in which case every minimizer of
(BPDN∞) is quantization inconsistent. In particular, Theorem 3.3 implies the
following statement.

Corollary 4.1. Let $A$ be an $m \times n$ standard Gaussian matrix and
$0 < \eta < 1$. Suppose that

$m \gtrsim s\log(en/s) + \log(\eta^{-1})$.

Then, with probability exceeding $1 - \eta$ the following holds: for any
$x \in \mathbb{R}^n$ and quantized measurements $y = Q_\theta(Ax)$, any solution
$x^\#$ to (QCBP) is a quantization consistent reconstruction of $x$ and satisfies
the error bound

$\|x - x^\#\|_2 \lesssim s^{-1/2}\sigma_s(x)_1 + \theta$.

Comparing Corollary 4.1 to the performance of the usual basis pursuit denoising,
(BPDN2), we can still reconstruct with the optimal number of measurements,
but the reconstruction error does not decay beyond (a constant multiple of) the
quantization precision $\theta$.

Let us compare to the work in m, where the authors introduced and analyzed
(BPDNp) with $2 \le p < \infty$ for the purpose of recovering a signal from
quantized measurements (as described above). They did not obtain a result for
$p = \infty$, but the idea is that the reconstruction becomes more consistent as
$p \to \infty$. A main result in [16] shows the following, via an (RIPp,2)-based
analysis. Assume that the error vector $e$ consists of i.i.d.
$U([-\theta/2, \theta/2])$ random variables, that is, we assume that the
quantization error is uniformly distributed in each bin (this is called the
high resolution assumption). With probability at least 1 — , 

l|e||p < £p ■■= 2(p+i)i/p 

This suggests trying to recover $x$ via (BPDNp) with $\varepsilon = \varepsilon_p$.
Let $A$ be an $m \times n$ standard Gaussian matrix with


(8) 


m> {pslog{eny/p/s) + p\og{ri 
then with probability at least $1 - \eta$, for any $x \in \mathbb{R}^n$ the
reconstruction $x^\#$ via (BPDNp) with $y = Q_\theta(Ax)$ and
$\varepsilon = \varepsilon_p$ satisfies

P - ^ 

Compared to Corollary 4.1, the part of the reconstruction error that is due to
quantization decays with $p$. Note, however, that the value we can take for $p$ is
implicitly limited by (8), and in particular we cannot set $p = \infty$, so that
$x^\#$ is not guaranteed to be quantization consistent. Moreover, when $p > 2$,
the number of required measurements grows faster than linearly in the sparsity. In
fact, it grows exponentially in $p$, as opposed to the minimal number of
measurements needed in Corollary 4.1.


5. Generalization to different distributions 


From the proof of Theorem 3.3 we extract the following statement, which allows
us to generalize our recovery result (as well as Corollary 4.1) to a variety of
random matrices beyond the Gaussian case, while retaining the same (optimal)
recovery guarantees as for a standard Gaussian matrix.


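Theorem 5.1. Let $A$ be an $m \times n$ random matrix with i.i.d. rows
$X_1, \ldots, X_m$, which are distributed as $X$. Suppose that for some $u_* > 0$
and $\beta > 0$,

(9) $\mathbb{P}[|\langle X, x\rangle| \ge u_*] \ge \beta$ for all $x \in S_{\ell_2^n}$,

and, if $V = m^{-1/2}\sum_{i=1}^m \varepsilon_i X_i$, then for some $\kappa > 0$,

$\mathbb{E}\sup_{x \in \Sigma_s^2} \langle V, x\rangle = \mathbb{E}\Big(\sum_{i=1}^s (V_i^*)^2\Big)^{1/2} \le \kappa\sqrt{s\log(en/s)}$,

where $V_1^* \ge \ldots \ge V_n^*$ is the nonincreasing rearrangement of $V$. Fix
$1 \le p \le \infty$ and $q \ge 2$. If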


$m \gtrsim \max\Big\{\frac{\kappa^2 s^{2-2/q}}{u_*^2\beta^2}\log(en/s), \ \frac{\log(\eta^{-1})}{\beta^2}\Big\}$,


then with probability at least $1 - \eta$ the following holds: for any
$x \in \mathbb{C}^n$ and $y = Ax + e$, where $\|e\|_p \le \varepsilon$, any
solution $x^\#$ to (BPDNp) satisfies

e 


P# I 






for any $1 \le r \le q$.


To verify the small ball condition (9), it is often useful to apply the
Paley-Zygmund inequality

(10) $\mathbb{P}(\zeta \ge t) \ge \frac{(\mathbb{E}\zeta - t)^2}{\mathbb{E}\zeta^2}$, $0 \le t \le \mathbb{E}\zeta$,

which holds for any nonnegative random variable $\zeta$. In particular, if $X$ is
a random vector with independent, mean-zero entries $\zeta_1, \ldots, \zeta_n$
which have variance $\sigma^2$ and fourth moment bounded by $\mu^4$, then

( 11 ) P(|(X,a;)| >t)> ^'" ^ , 0 <t<iT, 

whenever $\|x\|_2 = 1$. We refer to [H Lemmas 7.16 and 7.17] for details.
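
As a quick sanity check of the Paley-Zygmund inequality in the form $\mathbb{P}(\zeta \ge t) \ge (\mathbb{E}\zeta - t)^2/\mathbb{E}\zeta^2$ for $0 \le t \le \mathbb{E}\zeta$, one can take a standard exponential random variable, for which $\mathbb{P}(\zeta \ge t) = e^{-t}$, $\mathbb{E}\zeta = 1$ and $\mathbb{E}\zeta^2 = 2$. The choice of test distribution and the function name are illustrative assumptions.

```python
import math


def paley_zygmund_bound(mean, second_moment, t):
    """Lower bound (E zeta - t)^2 / E zeta^2 on P(zeta >= t), valid for 0 <= t <= E zeta."""
    if not 0.0 <= t <= mean:
        raise ValueError("t must lie between 0 and the mean")
    return (mean - t) ** 2 / second_moment


if __name__ == "__main__":
    # Standard exponential variable: exact tail exp(-t), mean 1, second moment 2.
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, math.exp(-t), paley_zygmund_bound(1.0, 2.0, t))
```

The bound is far from sharp here, but it only requires control of two moments, which is what makes it useful for heavy-tailed entries.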

Let us now verify the conditions of Theorem 5.1 for some concrete classes of
matrices.


Corollary 5.2. Suppose that the rows of $A$ are i.i.d. copies of $X$, where $X$ is
• sub-isotropic, i.e., $\mathbb{E}|\langle X, x\rangle|^2 \ge \|x\|_2^2$ for all $x \in \mathbb{C}^n$;
• 1-subgaussian, i.e., $\mathbb{E}\exp(t\langle X, x\rangle) \le \exp(t^2)$ for all $x \in \mathbb{C}^n$ with $\|x\|_2 \le 1$
and $t \in \mathbb{R}$.

If $m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$ then the conclusion of Theorem 3.3 holds.

Proof. We verify the two conditions of Theorem 5.1. To verify (9), we use (10) for
$|\langle X, x\rangle|^2$ to get

ni(vx)i>.)>MA|!^ 

whenever $0 \le u \le 1$ and $\|x\|_2 = 1$. In the last inequality, we used that
$X$ is sub-isotropic and subgaussian.

To verify the second condition, note that by assumption, the random variable
$\langle X_i, x - y\rangle$ is 2-subgaussian for any $x, y \in \Sigma_s^2$.
Therefore $V = m^{-1/2}\sum_{i=1}^m \varepsilon_i X_i$ is a 4-subgaussian random
vector (see e.g. [Ml Theorem 7.27]). By Dudley’s inequality
(see e.g. [Ml Theorem 8.23]),

$\mathbb{E}\sup_{x \in \Sigma_s^2} \langle V, x\rangle \lesssim \int_0^1 [\log\mathcal{N}(\Sigma_s^2, \|\cdot\|_2, u)]^{1/2}\,du$.

Since for any u > 0 

Ar(E 2 ,ll.ll 2 ,u)< ^ max Ar(B5, || • jU,«) < (en/s)«(l + (2/u))^ 
\S/ Scfn]: |S|<s 

we conclude that

$\mathbb{E}\sup_{x \in \Sigma_s^2} \langle V, x\rangle \lesssim \sqrt{s\log(en/s)} + \sqrt{s}\int_0^1 [\log(1 + (2/u))]^{1/2}\,du \lesssim \sqrt{s\log(en/s)}$.


□ 


The following result concerns matrices with i.i.d. entries. 

Corollary 5.3. Suppose that $X = (\xi_1, \ldots, \xi_n)$, with the $\xi_i$
independent, mean-zero and identically distributed as $\xi$. Suppose that for some
$\lambda > 0$ and $\alpha \ge 1/2$,

(12) $(\mathbb{E}|\xi|^r)^{1/r} \le \lambda r^\alpha$, for all $2 \le r \le \log n$,

and that (9) holds. If

m > maxj ^J^^^ log(en/s), ^ , (log(n))2“"^|, 

then the conclusion of Theorem 5.1 holds.

Specializing Corollary 5.3 to $p = q = 2$, we obtain a result similar to [Ml Theorem
A]. Let us compare the two results. On the one hand, our result gives a better
power in the $\log(n)$ factor ($2\alpha - 1$ versus $4\alpha - 1$) and improved
(actually optimal) dependence on the failure probability $\eta$. On the other hand,
El Theorem A] does not require independence of the $\xi_i$ and needs only a small
ball assumption on the set of sparse vectors $\Sigma_s$ (rather than one on the
larger set $S_{\ell_2^n}$ used here).

Proof. We fix the randomness in the Rademacher sequence $(\varepsilon_i)$. The
random variables $V_j = m^{-1/2}\sum_{i=1}^m \varepsilon_i X_{ij}$ are then
independent and mean-zero. Since $X_{ij}$
satisfies (HD), m Lemma 2.8] shows that if m > (log(n))“’^’^^^“ then for any 
2 < p < log(n) 

{E\Vj\Py/P < e2“-iAVp, 

i.e., the first log(n) moments show subgaussian behaviour. Therefore, (the proof 
of) El Lemma 6.5] shows that 

^ e"“-'Av'slog(en/s). 
The result is now immediate from Theorem 5.1. □

Example 5.4. Let $A$ be a random matrix with i.i.d. entries $A_{ij}$. Below we list
some conditions under which the conclusion of Theorem 3.3 is valid. Note that if we
measure the reconstruction error in $\ell_2$ (i.e., $q = 2$), then the stated lower
bounds always coincide with the optimal number of measurements.

(i) If the $A_{ij}$ are random signs (i.e. Rademachers), then
$m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$ is sufficient for the recovery
guarantee in Theorem 3.3. This follows from Corollary 5.3 with $\lambda = 1$,
$\alpha = 1/2$ and $\beta, u_*$ universal constants.

(ii) If the $A_{ij}$ are standard symmetric exponential random variables, then
$m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$ suffices for the recovery
guarantee in Theorem 3.3. Indeed, in this case one can apply Corollary 5.3 with
$\lambda = \alpha = 1$ and take for $\beta, u_*$ universal constants.

(iii) Suppose that the $A_{ij}$ are distributed as a random variable $\xi$ which has
probability density function

$p(x) = \frac{\gamma - 1}{2\gamma}\min\{1, |x|^{-\gamma}\}$, $x \in \mathbb{R}$,

for some $\gamma > 1$. One readily calculates that

$\mathbb{E}|\xi|^p = \frac{\gamma - 1}{\gamma}\Big(\frac{1}{\gamma - p - 1} + \frac{1}{p + 1}\Big)$

for $p < \gamma - 1$ and $\mathbb{E}|\xi|^p = \infty$ for $p \ge \gamma - 1$. If we
assume $\gamma \ge \log(n) + 2$, say, then $\xi$ trivially satisfies the moment
bound in Corollary 5.3 with $\alpha = 1/2$. Moreover, if $\gamma > 5$ then
$\mathbb{E}\xi^2 = (\gamma - 1)/(3\gamma - 9)$ and
$\mathbb{E}\xi^4 = (\gamma - 1)/(5\gamma - 25)$, so the Paley-Zygmund inequality
(11) implies that (9) holds for universal constants $u_*, \beta$ if
$\gamma \ge 6$, say. In conclusion, if we assume

$\gamma \ge \max\{\log(n) + 2, 6\}$,

then $m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$ is sufficient for the
recovery guarantee in Theorem 3.3.

The last example illustrates that only the behaviour of the first $\log n$ moments
of the entries of $A$ is important for our sparse recovery result; the higher
moments need not even exist.
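
The moment computation for the heavy-tailed density in part (iii) can be double-checked in exact arithmetic. The sketch below assumes the density is proportional to $\min\{1, |x|^{-\gamma}\}$ with normalizing constant $(\gamma - 1)/(2\gamma)$, which gives $\mathbb{E}|\xi|^p = \frac{\gamma - 1}{\gamma}\big(\frac{1}{\gamma - p - 1} + \frac{1}{p + 1}\big)$ for $p < \gamma - 1$; the function name `abs_moment` is a hypothetical helper.

```python
from fractions import Fraction


def abs_moment(gamma, p):
    """E|xi|^p for the density (gamma-1)/(2*gamma) * min(1, |x|**(-gamma)), in exact arithmetic."""
    gamma, p = Fraction(gamma), Fraction(p)
    if p >= gamma - 1:
        raise ValueError("the moment of order p is infinite when p >= gamma - 1")
    # Contribution of |x| <= 1 is 1/(p+1), contribution of |x| > 1 is 1/(gamma-p-1),
    # both up to the common factor 2 * (gamma-1)/(2*gamma).
    return (gamma - 1) / gamma * (1 / (gamma - p - 1) + 1 / (p + 1))


if __name__ == "__main__":
    gamma = 10
    print(abs_moment(gamma, 0))  # total mass of the density
    print(abs_moment(gamma, 2), Fraction(gamma - 1, 3 * gamma - 9))
    print(abs_moment(gamma, 4), Fraction(gamma - 1, 5 * gamma - 25))
```

The zeroth moment confirms that the density integrates to one, and the second and fourth moments agree with the closed forms $(\gamma - 1)/(3\gamma - 9)$ and $(\gamma - 1)/(5\gamma - 25)$ stated in the example.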

To conclude this section, we extend the example of a standard symmetric
exponential matrix (part (ii) of Example 5.4) to matrices with i.i.d. isotropic,
unconditional, log-concave rows. In particular, we do not assume that the entries
within a row are independent. Recall that a probability measure $\mu$ on
$\mathbb{R}^n$ is called log-concave if for any (Borel) sets
$A, B \subset \mathbb{R}^n$ and $0 \le \theta \le 1$,

$\mu(\theta A + (1 - \theta)B) \ge \mu(A)^\theta\mu(B)^{1-\theta}$.

A random vector $Y$ is called log-concave if its probability distribution is
log-concave. We call $Y$ isotropic if it is mean-zero and
$\mathbb{E}\langle Y, x\rangle^2 = \|x\|_2^2$ for all $x \in \mathbb{R}^n$. We say
that $Y$ is unconditional if, for any
$\varepsilon_1, \ldots, \varepsilon_n \in \{-1, 1\}$, the vector
$(\varepsilon_1 Y_1, \ldots, \varepsilon_n Y_n)$ has the same distribution as $Y$.
A typical example of an isotropic, unconditional log-concave vector $Y$ is a random
variable uniformly distributed over the unit ball of an unconditional norm in the
isotropic position.

We will use the following comparison theorem from [5D] (see also Theorem 2.5
in uni), which is based on earlier work in [^ . It will allow us to reduce the
general case of matrices with i.i.d. isotropic, unconditional, log-concave rows to
the special case of a standard symmetric exponential matrix.

Theorem 5.5. Let $Y$ be an isotropic, unconditional, log-concave vector in
$\mathbb{R}^d$ and $\mathcal{E}$ be a standard $d$-dimensional symmetric
exponential vector, i.e., its entries are i.i.d. standard symmetric exponential
variables. Let $\|\cdot\|$ be any semi-norm on $\mathbb{R}^d$. Then for any
$t > 0$,

$\mathbb{P}[\|Y\| \ge Ct] \le C\,\mathbb{P}[\|\mathcal{E}\| \ge t]$,

where $C$ is a universal constant.


Corollary 5.6. Let $A$ be an $m \times n$ matrix with i.i.d. rows $X_i$ distributed
as $X$, where $X$ is an isotropic, unconditional log-concave vector. If
$m \gtrsim s^{2-2/q}\log(en/s) + \log(\eta^{-1})$, then the conclusion of
Theorem 3.3 holds.


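Proof. We verify the conditions of Theorem 5.1. By a result of Borell (see e.g.
Proposition 2.14]), $X$ is a sub-exponential vector. In fact, for any $p \ge 1$,

$(\mathbb{E}|\langle X, x\rangle|^p)^{1/p} \lesssim p\,\mathbb{E}|\langle X, x\rangle|$ for all $x \in \mathbb{R}^n$.

Since $X$ is isotropic, we can apply (10) for $|\langle X, x\rangle|^2$ to get

$\mathbb{P}(|\langle X, x\rangle| \ge u) \ge \frac{(\mathbb{E}|\langle X, x\rangle|^2 - u^2)^2}{\mathbb{E}|\langle X, x\rangle|^4} \gtrsim (1 - u^2)^2$

whenever $0 \le u \le 1$ and $\|x\|_2 = 1$. Here the last step uses isotropy,
$\mathbb{E}|\langle X, x\rangle|^2 = 1$, together with the moment comparison above
for $p = 4$. This shows that (9) holds with absolute constants $u_*, \beta > 0$.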

To prove the second condition, we define a semi-norm on $\mathbb{R}^{m \times n}$ by


||B|L = sup 


m 


where the $B_i$ are the $m$ row vectors of $B \in \mathbb{R}^{m \times n}$. Since the $X_i$ are unconditional,


E sup(y,ai) = ^EPIl 


Considered as a vector in $\mathbb{R}^{mn}$, $A$ is isotropic, unconditional and
log-concave. Theorem 5.5 therefore implies that,


P[P||,>ct] <CP[||£:||,>t], 

where $\mathcal{E}$ is an $m \times n$ standard symmetric exponential matrix. As a
consequence, we have

nOC nOO 

E||A||,= / P[||A||,>t]dt<C'M P[||f||, >t]dt<E||f||,. 

Jo Jo 

By the proof of Corollary[0|(see also (ii) of Examnle lb.d]) . E ||g||^ < log(en/s), 
which proves the second condition in Theorem 5.1. □


As was mentioned before, Koltchinskii showed that $m \gtrsim s\log(en/s)$
isotropic, log-concave measurements suffice with high probability to recover every
$s$-sparse vector exactly via $\ell_1$-minimization [B Theorem 7.3]. Under the
additional assumption that the measurement vectors are unconditional,
Corollary 5.6 makes this result stable with respect to approximate sparsity and
robust with respect to measurement noise, while retaining the optimal number of
measurements.


6. RIP RIP?

The classical RIP property, (RIP2,2), has played a major role in the theory of
compressed sensing since mis]. It has proved to be an optimal tool to analyze
standard basis pursuit denoising for subgaussian matrices. It has also been used to
show that various other random matrices, including structured random matrices,
allow for uniform sparse recovery via (BPDN2) if one increases the number of
measurements with additional logarithmic factors. Nevertheless, it is known that
for certain ensembles (e.g. subexponential) this logarithmic increase can be
avoided, establishing a gap between RIP and sparse recovery conditions.
In this work we showed that this gap becomes much more pronounced when
considering (BPDNp) for $p \neq 2$. An analysis of this program via an RIP
condition erroneously suggests that 1) the required optimal number of measurements
for uniform sparse recovery may be much larger than in the case $p = 2$,
especially if $p > 2$, and 2) that one may need to consider random measurements
different from Gaussian to attain this optimal number. This raises the question:
does this mean that researchers interested in sparse recovery should stop
considering restricted isometry properties? In this paper we showed that by proving
a lower (RIPp,q)-type of bound on an extension of the set of sparse vectors (cf.
(7)), one can prove an optimal recovery result for a large class of matrices, which
do not satisfy (RIPp,q) in the optimal measurement regime. Thus, it seems the gap
between RIP-properties and sparse recovery conditions originates in the upper bound
of the RIP, “for all $x \in \Sigma_s$, $\|Ax\|_p \le C\|x\|_q$”, at least when
considering convex optimization approaches for recovery.

To move towards a definitive answer to our question, it would be interesting to
determine whether similar gaps occur between RIP-properties and sparse recovery
conditions for other numerical methods. For example, there are several algorithms
such as iterative hard thresholding and CoSaMP for which convergence results are
currently only known under the (classical) RIP.